Abstract. We consider the problem of recovering a block (or group) sparse signal from an underdetermined set of random linear measurements, which appear in compressed sensing applications such as radar and imaging. Recent results of Donoho, Johnstone, and Montanari have shown that approximate message passing (AMP) in combination with Stein's shrinkage outperforms group LASSO for large block sizes. In this paper, we prove that, for a fixed block size and in the strong undersampling regime (i.e., having very few measurements compared to the ambient dimension), AMP cannot improve upon group LASSO, thereby complementing the results of Donoho et al.
1. Introduction. The field of compressed sensing (CS) aims to recover a sparse signal from an underdetermined system of linear equations. Concretely, CS can be modeled as $\mathbf{y} = A\mathbf{x}$, where $\mathbf{y}$ is the $n$-dimensional measurement vector, $A$ is the (typically random) $n \times N$ measurement matrix, and $\mathbf{x}$ is an $N$-dimensional vector with at most $k$ nonzero entries (often referred to as a $k$-sparse vector). Our goal is to recover $\mathbf{x}$ from this underdetermined system.

A large class of signals of interest exhibits additional structure known as block (or group) sparsity, where the nonzero coefficients of the signal occur in clusters of size $B$ [1], [2]. Such block-sparse signals naturally appear in genomics, radar, and communication applications. There has been a considerable amount of research on theory and algorithms for recovering such signals [1]. Perhaps the most popular recovery algorithm is group LASSO [1]; corresponding theoretical work has established conditions under which this algorithm recovers the exact group-sparse solution [2]-[11].

While these results enable a qualitative characterization of the recovery performance of group LASSO, they do not provide an accurate performance analysis.

In order to arrive at a more accurate performance analysis of group LASSO, several authors have considered the asymptotic setting where $n, N \to \infty$ while their ratio $\delta = n/N$ is held constant [12]-[16]. Under this setting, references [14]-[16] have shown that there exists a threshold on the normalized sparsity $\rho = k/n$ below which group LASSO recovers the correct signal vector $\mathbf{x}$ with probability 1 and above which it fails. Such a phase-transition (PT) analysis has led to the conclusion that group LASSO is sub-optimal, since there is a large gap between the PT of the information-theoretic limit¹ and that of group LASSO (see Fig. 1.1).



There has been recent effort in using approximate message passing (AMP) to improve upon the performance of group LASSO. Schniter, for example, has experimentally shown in [18] that AMP combined with expectation maximization can outperform group LASSO. Kamilov et al. have taken the first step toward a theoretical understanding of such algorithms [19]. More recently, Donoho, Johnstone, and Montanari [16] have shown that AMP is able to outperform group LASSO if it employs Stein's shrinkage estimator. In fact, they demonstrate that for very large block sizes, i.e., for $B \to \infty$, the performance of this AMP variant is close to the information-theoretic limit. However, in many applications, such as radar and communication systems, the group sizes are typically small, and for fixed block sizes Stein's estimator does not necessarily improve the performance of AMP. Consequently, the fundamental question remains whether AMP can outperform group LASSO for any block size. In this paper, we address this question in the strong undersampling regime where $\delta \to 0$.

¹ The information-theoretic limit was only derived for regular sparse signals (with block size 1) [17]. An extension of these results to block-sparse signals with larger blocks is straightforward.

Fig. 1.1. Phase transition (PT) of group LASSO for various block sizes $B$. Evidently, there is a gap between the phase transition of group LASSO and the information-theoretic limit.

In particular, we show that, for $\delta \to 0$, there is no nonlinear shrinkage function that allows AMP to outperform group LASSO. We emphasize that this result does not contradict that in [16], since [16] considers a different limiting regime, namely one where the block size $B$ approaches infinity. Combining these two results, we conclude that, for strong undersampling (i.e., small values of $\delta$), AMP with Stein's estimator requires large block sizes in order to outperform group LASSO.



2. Background. 

2.1. Notation. Lowercase boldface letters, such as $\mathbf{v}$, represent vectors; uppercase letters, such as $V$, represent matrices; and lowercase letters, such as $v$, represent scalars. We analyze the recovery of a block (or group) sparse signal $\mathbf{x} \in \mathbb{R}^N$ with at most $k$ nonzero entries from the undersampled linear measurements $\mathbf{y} = A\mathbf{x}$, where the entries of $A \in \mathbb{R}^{n \times N}$ are i.i.d. zero-mean Gaussian with unit variance. We furthermore consider the asymptotic setting where $\delta = n/N$ and $\rho = k/n$ are held fixed and $N, n, k \to \infty$.

The notational conventions for block-sparse signals are as follows. We assume that all blocks have the same size, denoted by $B$; extensions to signals with varying block sizes are straightforward. The signal $\mathbf{x}$ is partitioned into $M$ blocks, so that $N = MB$. In the remainder of the paper, $\mathbf{x}_B$ denotes a particular block. Suppose that the blocks $\mathbf{x}_B$ are drawn from a given distribution $F(\mathbf{x}_B) = (1-\epsilon)\,\delta_0(\|\mathbf{x}_B\|_2) + \epsilon\, G(\mathbf{x}_B)$, where $\epsilon = \rho\delta$, $\delta_0$ is the Dirac delta function, and $G$ is a probability distribution that is typically unknown in practice.
The block soft-thresholding function used in this paper is defined as follows [16]:

$$\eta^{\mathrm{soft}}(\mathbf{y}_B; \tau) = \frac{\mathbf{y}_B}{\|\mathbf{y}_B\|_2}\big(\|\mathbf{y}_B\|_2 - \tau\big)_+. \tag{2.1}$$

Here, $(z)_+ = \max(z, 0)$, and $\eta^{\mathrm{soft}}(\mathbf{y}_B; \tau)$ sets its argument $\mathbf{y}_B$ to zero if $\|\mathbf{y}_B\|_2 \le \tau$ and shrinks the vector $\mathbf{y}_B$ toward the origin by $\tau$ otherwise.
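A minimal NumPy sketch of (2.1), assuming the signal is stored as an $(M, B)$ array with one block per row. The storage layout and the function name `block_soft_threshold` are illustrative choices, not part of the paper:

```python
import numpy as np

def block_soft_threshold(y_blocks, tau):
    """Block soft thresholding (2.1) applied to every row of an (M, B) array.

    A row y_B is set to zero if ||y_B||_2 <= tau; otherwise it is shrunk toward
    the origin by tau while keeping its direction.
    """
    norms = np.linalg.norm(y_blocks, axis=1, keepdims=True)
    # (||y_B|| - tau)_+ / ||y_B||; the tiny floor only guards the 0/0 case,
    # in which the row is zero and the output is zero anyway.
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-300)
    return scale * y_blocks

# Two blocks of size B = 2 with threshold tau = 1.
y = np.array([[3.0, 4.0],    # norm 5 > 1: shrunk along its direction to norm 4
              [0.3, 0.4]])   # norm 0.5 <= 1: set to zero
print(block_soft_threshold(y, 1.0))
```

Operating on whole rows keeps the group structure explicit: a block is either discarded entirely or kept with its direction unchanged.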



2.2. Group LASSO and approximate message passing (AMP). A decade of research in sparse recovery has produced a plethora of algorithms for recovering block-sparse signals from random linear measurements. Two popular algorithms are group LASSO and AMP. Group LASSO searches for the vector that minimizes the sum of the block $\ell_2$-norms subject to the measurement constraints,

$$\hat{\mathbf{x}}^{L} = \arg\min_{\mathbf{x}}\Big\{\sum_{B}\|\mathbf{x}_B\|_2 \,:\, \mathbf{y} = A\mathbf{x}\Big\},$$

where the sum runs over the $M$ blocks. AMP, on the other hand, is an iterative algorithm for recovering the solution vector $\mathbf{x}$. Concretely, after initializing $\mathbf{x}^0 = \mathbf{0}$ and $\mathbf{z}^0 = \mathbf{0}$, AMP iteratively performs the following steps:



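$$\mathbf{x}^{t+1} = \eta\big(\mathbf{x}^t + A^{*}\mathbf{z}^t\big) \quad\text{and}\quad \mathbf{z}^t = \mathbf{y} - A\mathbf{x}^t + \mathbf{c}^t. \tag{2.2}$$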



Here, $\mathbf{c}^t$ is a correction term that depends on the previous iterations and significantly improves the convergence of AMP; $\mathbf{x}^t$ is the (block) sparse estimate at iteration $t$, and $\eta$ is a nonlinear function that imposes (block) sparsity. In particular, if $\eta(\cdot)$ is the block soft-thresholding function in (2.1), then AMP is equivalent to group LASSO in the asymptotic setting [15], [16] (see [13] for the details).
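The recursion (2.2) with $\eta = \eta^{\mathrm{soft}}$ can be sketched in NumPy as follows. This is a hypothetical illustration, not the authors' implementation, and it rests on several assumptions. First, $A$ has $\mathcal{N}(0, 1/n)$ entries, i.e., the unit-variance matrix of Section 2.1 rescaled by $1/\sqrt{n}$. Second, the threshold is $\tau^t = c\,\hat\sigma_t$ with $\hat\sigma_t = \|\mathbf{z}^t\|_2/\sqrt{n}$ and a hand-picked constant $c$ (`tau_mult`). Third, $\mathbf{c}^t$ takes the standard Onsager form: $\mathbf{z}^{t-1}$ times the total divergence of $\eta^{\mathrm{soft}}$, divided by $n$. Per block, that divergence equals $B - (B-1)\tau/\|\mathbf{v}_B\|_2$ when $\|\mathbf{v}_B\|_2 > \tau$ and zero otherwise. The loop starts from $\mathbf{x}^0 = \mathbf{0}$ and $\mathbf{z}^0 = \mathbf{y}$, which is the state that the zero initialization reaches after one trivial iteration.

```python
import numpy as np

def block_soft(v, tau):
    """Block soft thresholding (2.1) on the rows of an (M, B) array."""
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-300) * v

def block_soft_divergence(v, tau):
    """Sum over blocks of div eta_soft: B - (B - 1) * tau / ||v_B|| if ||v_B|| > tau, else 0."""
    B = v.shape[1]
    norms = np.linalg.norm(v, axis=1)
    per_block = np.where(norms > tau, B - (B - 1) * tau / np.maximum(norms, 1e-300), 0.0)
    return per_block.sum()

def amp_block_soft(y, A, B, tau_mult=3.5, iters=60):
    """AMP (2.2) with block soft thresholding; tau_mult is a hypothetical tuning constant."""
    n, N = A.shape
    x = np.zeros(N)
    z = y - A @ x                              # x^0 = 0 and no correction term yet, so z = y
    for _ in range(iters):
        sigma = np.linalg.norm(z) / np.sqrt(n)  # estimate of the effective noise level
        tau = tau_mult * sigma
        v = (x + A.T @ z).reshape(-1, B)        # pseudo-data x^t + A^* z^t, one row per block
        x = block_soft(v, tau).ravel()
        # Residual plus Onsager correction: (1/n) * z^t * sum of block divergences
        z = y - A @ x + z * block_soft_divergence(v, tau) / n
    return x

# Hypothetical demo: 10 active blocks of size B = 4 out of 250, delta = n / N = 0.5.
rng = np.random.default_rng(0)
n, N, B = 500, 1000, 4
A = rng.standard_normal((n, N)) / np.sqrt(n)    # unit-variance entries rescaled by 1/sqrt(n)
x0 = np.zeros((N // B, B))
active = rng.choice(N // B, size=10, replace=False)
x0[active] = rng.standard_normal((10, B))
x0 = x0.ravel()
x_hat = amp_block_soft(A @ x0, A, B)
print("relative error:", np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))
```

The demo sits well below the group-LASSO PT, so the relative error should become very small. Pushing $\rho$ above the PT typically makes the error stall instead.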



One of the most appealing features of AMP is that each of its iterations can be viewed as a denoising problem. That is, when $N \to \infty$, $\mathbf{x}^t + A^{*}\mathbf{z}^t$ can be modeled as the sparse signal $\mathbf{x}$ plus zero-mean Gaussian noise that is independent of the signal. This feature enables one to analytically predict the performance of AMP through a framework called state evolution (SE). Concretely, if the mean-square error (MSE) of AMP at iteration $t$ is denoted by $\mathrm{MSE}^t$, then

$$\mathrm{MSE}^{t+1} = \frac{1}{\delta B}\,\mathbb{E}\Big\{\big\|\mathbf{x}_B - \eta^t\big(\mathbf{x}_B + \sqrt{\mathrm{MSE}^t}\,\mathbf{z}_B\big)\big\|_2^2\Big\}, \tag{2.3}$$

where $\mathbf{z}_B \sim \mathcal{N}(\mathbf{0}, I_B)$ and the distribution of $\mathbf{x}_B$ is the same as the empirical distribution of the blocks of the original vector $\mathbf{x}$. The expectation $\mathbb{E}\{\cdot\}$ is taken with respect to $\mathbf{z}_B$ and $\mathbf{x}_B$.
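State evolution (2.3) can be iterated numerically by replacing the expectation with an average over a large sample of blocks. The sketch below makes three assumptions: $\eta^t$ is block soft thresholding with threshold $\tau\sqrt{\mathrm{MSE}^t}$ (the scaling used later in (4.1)), the nonzero blocks have Gaussian entries, and the parameter values are hypothetical. A fresh noise sample is drawn at every iteration.

```python
import numpy as np

def block_soft(v, tau):
    """Block soft thresholding (2.1) on the rows of an (M, B) array."""
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-300) * v

def state_evolution(x_blocks, delta, tau_mult, mse0, iters=30, seed=0):
    """Iterate (2.3) with eta^t = block soft thresholding at tau_mult * sqrt(MSE^t).

    x_blocks: (M, B) sample standing in for the block distribution of x; the
    expectation over z_B is replaced by an average over the M sampled blocks.
    """
    rng = np.random.default_rng(seed)
    M, B = x_blocks.shape
    mse = [mse0]
    for _ in range(iters):
        s = np.sqrt(mse[-1])
        z = rng.standard_normal((M, B))
        est = block_soft(x_blocks + s * z, tau_mult * s)
        err = np.sum((x_blocks - est) ** 2, axis=1)   # ||x_B - eta(x_B + s z_B)||^2 per block
        mse.append(err.mean() / (delta * B))          # the 1 / (delta * B) factor of (2.3)
    return mse

# Hypothetical example: B = 4, 4% nonzero blocks with N(0, 1) entries, delta = 0.5.
rng = np.random.default_rng(1)
M, B = 20000, 4
x_blocks = np.zeros((M, B))
x_blocks[: M // 25] = rng.standard_normal((M // 25, B))
mse = state_evolution(x_blocks, delta=0.5, tau_mult=3.5, mse0=1.0)
print(mse[:4], mse[-1])
```

Below the PT, the sequence decays geometrically toward zero. Above it, the recursion settles at a positive fixed point.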



2.3. Phase transition. The performance of CS recovery algorithms can be characterized accurately by their phase transition (PT) behavior. Specifically, we define a two-dimensional phase space $(\delta, \rho) \in [0, 1]^2$ that is partitioned into two regions, "success" and "failure", which are separated by the PT curve $(\delta, \rho(\delta))$. For the same value of $\delta$, algorithms with a higher PT outperform algorithms with a lower PT, i.e., they guarantee exact recovery for a larger number of nonzero entries $k$.



3. Main results. The thresholding function $\eta$ determines the performance of AMP; indeed, different choices of $\eta$ may lead to fundamentally different performance. It has been shown in [15], [16] that if $\eta^{\mathrm{soft}}$ from (2.1) is used, then the performance of AMP is equivalent to that of group LASSO. Since $\eta^{\mathrm{soft}}$ is not necessarily the optimal thresholding function for group-sparse signals, finding the optimal function is of significant practical interest.
In this paper, we characterize the optimal choice of the thresholding function $\eta$ in the strong undersampling regime, i.e., for $\delta \to 0$. Before we proceed, let us define optimality. Suppose that each block $\mathbf{x}_B$ is drawn independently from the distribution $F(\mathbf{x}_B)$ defined in Section 2.1. We furthermore assume that $\eta$ is applied to each block separately. As is evident from (2.3), each iteration of AMP is equivalent to a denoising problem in which the noise variance equals the MSE of the previous iteration. Therefore, for a given initialization point, the effect of the thresholding function $\eta$ on AMP can be characterized by the discrete set $\{\mathrm{MSE}^t\}_{t=1}^{\infty}$. Thus, instead of considering a sequence of iteration-dependent thresholding functions $\{\eta^t\}_{t=1}^{\infty}$, we can consider a single thresholding function $\tilde{\eta}: \mathbb{R}^B \times \mathbb{R}_+ \to \mathbb{R}^B$ that depends on the MSE, with the property $\eta^t(\mathbf{y}_B) = \tilde{\eta}(\mathbf{y}_B, \mathrm{MSE}^t)$. Since the MSE sequence depends on the initialization point of AMP (which can be chosen arbitrarily), we wish to optimize $\tilde{\eta}$ with respect to all possible initializations. Hence, at each iteration, the problem simplifies to finding the optimal $\tilde{\eta}$ as a function of $\mathbf{y}_B$ and of any MSE value greater than zero. Now, suppose that the PT of AMP with $\tilde{\eta}$ is given by $\rho_{\tilde{\eta}}(\delta, G)$. Then, we are interested in thresholding functions that achieve

$$\rho^*(\delta) = \sup_{\tilde{\eta}}\,\inf_{G}\,\rho_{\tilde{\eta}}(\delta, G). \tag{3.1}$$

Such an $\tilde{\eta}$ provides the best PT performance for the least favorable distribution. This is a reasonable criterion, since the distribution $G$ is typically unknown in practical applications. Our first contribution characterizes the behavior of $\rho^*(\delta)$ for strong undersampling, i.e., for small values of $\delta$.

Theorem 3.1. The optimal PT $\rho^*(\delta)$ of AMP follows the behavior

$$\rho^*(\delta) \sim \frac{B}{2\log(1/\delta)} \quad \text{as } \delta \to 0,$$

where $\sim$ means that the ratio of the two sides tends to one.

As shown in Section 4.2, this behavior is determined by the case in which $G$ is the uniform distribution on a sphere whose radius tends to infinity. We note that if the distribution $G$ is unknown, this theorem does not provide any guidelines on how to choose $\eta$ in (2.2). The next theorem shows that group LASSO follows exactly the same behavior; hence, block soft thresholding is the optimal choice for $\eta$ in the strong undersampling regime.

Theorem 3.2. The PT $\rho^L(\delta)$ of group LASSO follows the behavior

$$\rho^L(\delta) \sim \frac{B}{2\log(1/\delta)} \quad \text{as } \delta \to 0.$$

Combining Theorems 3.1 and 3.2 reveals that, for a fixed $B$ and in the strong undersampling regime, i.e., for $\delta \to 0$, the best achievable PT of AMP coincides with the PT of group LASSO. This result has two striking implications for strong undersampling and fixed block sizes: (i) block soft thresholding is optimal, and (ii) there is no thresholding function for AMP that outperforms group LASSO. Consequently, as $\delta$ decreases, AMP equipped with a thresholding operator that can improve upon block soft thresholding, such as Stein's shrinkage, requires larger block sizes to outperform group LASSO in this regime. It is worth mentioning that these results do not contradict those of [16, Section 3.2], which show that AMP with Stein's shrinkage outperforms group LASSO for large block sizes. In fact, combining our results with those in [16] provides a better picture of the potential benefits of Stein's shrinkage within AMP.
4. Proofs of the main results. We outline the proofs of Theorems 3.2 and 3.1. Since the proof of Theorem 3.1 requires the result of Theorem 3.2, we begin by proving the latter.



4.1. Proof of Theorem 3.2. We start by deriving an implicit formula for the PT of group LASSO. We then use this formula and Laplace's method to obtain the behavior of the phase transition in the strong undersampling regime.

4.1.1. Implicit formula for the phase transition of group LASSO. Since the performance of group LASSO has been shown to be equivalent to that of AMP with block soft thresholding, we can use the state evolution formalism to obtain the phase transition of group LASSO. By the properties of SE (2.3), for any MSE value and for $\rho(\delta)$ below the PT, the following holds:

$$\frac{1}{\delta B}\,\mathbb{E}\Big\{\big\|\mathbf{x}_B - \eta^{\mathrm{soft}}\big(\mathbf{x}_B + \sqrt{\mathrm{MSE}}\,\mathbf{z}_B;\ \tau\sqrt{\mathrm{MSE}}\big)\big\|_2^2\Big\} < \mathrm{MSE}. \tag{4.1}$$
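To ensure that (4.1) is satisfied while achieving the optimal phase transition with respect to $\tau$, we use the minimax MSE $M_B$ [16],

$$M_B = \frac{1}{B}\inf_{\tau}\sup_{G}\,\mathbb{E}\Big\{\big\|\mathbf{x}_B - \eta^{\mathrm{soft}}(\mathbf{x}_B + \mathbf{z}_B; \tau)\big\|_2^2\Big\}, \tag{4.2}$$

in the asymptotic setting. Donoho et al. showed in [16] that

$$M_B = \frac{1}{B}\inf_{\tau}\Big\{\epsilon\,(B + \tau^2) + (1-\epsilon)\int_{\tau^2}^{\infty}\big(\sqrt{x} - \tau\big)^2\,\frac{x^{B/2-1}e^{-x/2}}{2^{B/2}\,\Gamma(B/2)}\,dx\Big\}.$$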
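The objective inside the infimum is convex in $\tau$, so $M_B$ can be evaluated by a one-dimensional minimization. The following SciPy sketch computes $M_B$ for a given $\epsilon$ and checks the first-order condition $\epsilon\tau^* = (1-\epsilon)\int_{\tau^{*2}}^{\infty}(\sqrt{x}-\tau^*)f(x)\,dx$, where $f$ is the $\chi^2_B$ density. That condition comes from differentiating the objective with respect to $\tau$, and combined with $M_B = \delta$ it yields (4.4) below. The function names and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy import integrate, optimize, stats

def chi_excess_moments(tau, B):
    """(E[(R - tau)_+], E[(R - tau)_+^2]) for R = ||z_B||_2, i.e. R^2 ~ chi^2_B."""
    pdf = stats.chi2(df=B).pdf
    m1 = integrate.quad(lambda x: (np.sqrt(x) - tau) * pdf(x), tau ** 2, np.inf)[0]
    m2 = integrate.quad(lambda x: (np.sqrt(x) - tau) ** 2 * pdf(x), tau ** 2, np.inf)[0]
    return m1, m2

def minimax_mse(eps, B, tau_max=20.0):
    """M_B(eps) = (1/B) min_tau [eps (B + tau^2) + (1 - eps) E(R - tau)_+^2] and the minimizer."""
    def objective(tau):
        return (eps * (B + tau ** 2) + (1 - eps) * chi_excess_moments(tau, B)[1]) / B
    res = optimize.minimize_scalar(objective, bounds=(0.0, tau_max), method="bounded",
                                   options={"xatol": 1e-10})
    return res.fun, res.x

for eps in (0.01, 0.05, 0.2):
    m_b, tau_star = minimax_mse(eps, B=4)
    m1, _ = chi_excess_moments(tau_star, 4)
    # First-order condition: eps * tau* - (1 - eps) * E[(R - tau*)_+] should be ~0
    print(f"eps={eps}: M_B={m_b:.4f}, tau*={tau_star:.4f}, FOC residual={eps * tau_star - (1 - eps) * m1:.2e}")
```

Sparser signals (smaller $\epsilon$) give a larger optimal threshold and a smaller minimax MSE, consistent with the large-$\tau^*$ analysis in Section 4.1.2.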



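Because block soft thresholding is scale invariant, the left-hand side of (4.1) is at most $\mathrm{MSE}\cdot M_B/\delta$ for every $G$ when $\tau$ is the minimax threshold, and one can rigorously show that the optimal phase transition obeys $M_B = \delta$ [13]. Using simple calculus, we obtain $\rho^L(\delta)$ as follows.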

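Lemma 4.1. The phase transition of group LASSO is given by

$$\rho^L(\delta) = \frac{B\delta - \int_{\tau^{*2}}^{\infty}\big(\sqrt{x} - \tau^*\big)^2 f(x)\,dx}{\delta\Big(B + \tau^{*2} - \int_{\tau^{*2}}^{\infty}\big(\sqrt{x} - \tau^*\big)^2 f(x)\,dx\Big)}, \tag{4.3}$$

where $f(x)$ is the probability density function of the $\chi^2_B$ distribution (a Gamma distribution with shape $B/2$ and rate $1/2$), and the optimal threshold parameter $\tau^*$ satisfies

$$\delta = \frac{-(B + \tau^{*2})\int_{\tau^{*2}}^{\infty}\big(\tau^* - \sqrt{x}\big)f(x)\,dx + \tau^*\int_{\tau^{*2}}^{\infty}\big(\sqrt{x} - \tau^*\big)^2 f(x)\,dx}{B\tau^* - B\int_{\tau^{*2}}^{\infty}\big(\tau^* - \sqrt{x}\big)f(x)\,dx}. \tag{4.4}$$

It is important to note that $\rho^L(\delta)$ is independent of the distribution $G$ (the proof is omitted here, as it is a mere extension of the one provided in [15]).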
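Because (4.4) gives $\delta$ explicitly as a function of $\tau^*$, the PT curve of Lemma 4.1 can be traced by sweeping $\tau^*$ instead of solving for it. The following is a SciPy sketch under that parametrization; the names and the grid of thresholds are illustrative. Its two consistency checks follow from the algebra: $\epsilon = \rho^L\delta$ must equal $I/(\tau^* + I)$, where $I = \int_{\tau^{*2}}^{\infty}(\sqrt{x}-\tau^*)f(x)\,dx$, which is the first-order condition for $\tau^*$, and the minimax MSE formula above must return $\delta$.

```python
import numpy as np
from scipy import integrate, stats

def group_lasso_pt(tau, B):
    """(delta, rho^L, I, J) from (4.4) and (4.3) for a given optimal threshold tau*."""
    pdf = stats.chi2(df=B).pdf
    # I = int (sqrt(x) - tau) f(x) dx, J = int (sqrt(x) - tau)^2 f(x) dx, over x > tau^2
    I = integrate.quad(lambda x: (np.sqrt(x) - tau) * pdf(x), tau ** 2, np.inf)[0]
    J = integrate.quad(lambda x: (np.sqrt(x) - tau) ** 2 * pdf(x), tau ** 2, np.inf)[0]
    # (4.4), rewritten with tau - sqrt(x) = -(sqrt(x) - tau)
    delta = ((B + tau ** 2) * I + tau * J) / (B * (tau + I))
    rho = (B * delta - J) / (delta * (B + tau ** 2 - J))   # (4.3)
    return delta, rho, I, J

B = 4
for tau in (1.0, 1.5, 2.0, 2.5, 3.0):
    delta, rho, _, _ = group_lasso_pt(tau, B)
    print(f"tau*={tau:.1f}  delta={delta:.4f}  rho_L={rho:.4f}")
```

As $\tau^*$ grows, both $\delta$ and $\rho^L$ decrease, so the sweep moves along the PT curve toward the strong undersampling corner studied next.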

4.1.2. Phase transition behavior for $\delta \to 0$. We are interested in the behavior of group LASSO for $\delta \to 0$. Intuitively, in this regime a very sparse signal is recovered, since $\epsilon = \rho\delta \to 0$. From the definition of block soft thresholding, $\tau^*$ must be large to promote sparse recovery. Using this knowledge, the integrals in (4.4) and (4.3) can be approximated via Laplace's method. One such integral is

$$I_1 = \int_{\tau^{*2}}^{\infty}\big(\tau^* - \sqrt{x}\big)\,x^{B/2-1}e^{-x/2}\,dx.$$



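We begin by letting $y = -\tau^* + \sqrt{x}$. As a result, the expression for $I_1$ simplifies to

$$I_1 = -2e^{-\tau^{*2}/2}\int_0^{\infty} y\,(y + \tau^*)^{B-1}e^{-y^2/2}e^{-\tau^* y}\,dy.$$

Suppose we break the integral into two parts:

$$I_1 = -2e^{-\tau^{*2}/2}\int_0^{\gamma} y\,(y + \tau^*)^{B-1}e^{-y^2/2}e^{-\tau^* y}\,dy - 2e^{-\tau^{*2}/2}\int_{\gamma}^{\infty} y\,(y + \tau^*)^{B-1}e^{-y^2/2}e^{-\tau^* y}\,dy,$$

where $\gamma$ is close to zero. Due to the decaying exponential in the integrand, the second integral, which is of the order $O\big(e^{-\tau^{*2}/2}e^{-\tau^*\gamma}\big)$, is much smaller than the first integral. Thus, we approximate $I_1$ by the first integral. Denote $g(y) = y\,(y + \tau^*)^{B-1}e^{-y^2/2}$. Since $\gamma$ is small, $g(y)$ can be well approximated by a first-order Taylor expansion around $y = 0$, namely $g(y) \approx (\tau^*)^{B-1}y$. This yields

$$I_1 = -2e^{-\tau^{*2}/2}\Big[\int_0^{\gamma}(\tau^*)^{B-1}\,y\,e^{-\tau^* y}\,dy + O\big((\tau^*)^{B-5}\big)\Big] \approx -2e^{-\tau^{*2}/2}(\tau^*)^{B-3},$$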

where the second approximation is obtained via integration by parts. The other integral in (4.4) that we wish to approximate is

$$I_2 = \int_{\tau^{*2}}^{\infty}\big(-\tau^* + \sqrt{x}\big)^2\,x^{B/2-1}e^{-x/2}\,dx.$$

Using similar techniques, we find $I_2 \approx 4e^{-\tau^{*2}/2}(\tau^*)^{B-4}$. Substituting these approximations into (4.3), we note that the numerator is dominated by the term $B\delta$ and the denominator by $\delta\tau^{*2}$. With this behavior, we obtain

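$$\rho^L(\delta) \sim \frac{B}{\tau^{*2}} \tag{4.5}$$

as $\tau^* \to \infty$. The expression for $\delta$ is similarly derived to be

$$\delta \approx \frac{2(B + \tau^{*2})\,e^{-\tau^{*2}/2}(\tau^*)^{B-3} + 4\tau^* e^{-\tau^{*2}/2}(\tau^*)^{B-4}}{B\tau^* h(B) + 2B\,e^{-\tau^{*2}/2}(\tau^*)^{B-3}},$$

where $h(B) = 2^{B/2}\,\Gamma(B/2)$. Since $\tau^* \to \infty$,

$$\delta \sim \frac{2\,(\tau^*)^{B-2}\,e^{-\tau^{*2}/2}}{B\,h(B)}. \tag{4.6}$$

Taking logarithms in (4.6) gives $\log(1/\delta) = \tau^{*2}/2 - (B-2)\log\tau^* + \log\big(B\,h(B)/2\big) + o(1)$, so for large $\tau^*$ we have $\log(1/\delta) \sim \tau^{*2}/2$. Combining this with (4.5) yields the desired result.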
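The Laplace approximations can be checked numerically. The sketch below evaluates the integrals after the substitution $y = \sqrt{x} - \tau^*$, with the factor $e^{-\tau^{*2}/2}$ pulled out to avoid underflow for large $\tau^*$. In this form, $I_1 = -2e^{-\tau^{*2}/2}K_1$ and $I_2 = 2e^{-\tau^{*2}/2}K_2$, so the claims above read $K_1 \approx (\tau^*)^{B-3}$ and $K_2 \approx 2(\tau^*)^{B-4}$. The code also compares the exact $\delta$ and $\rho^L$ from (4.3) and (4.4) with (4.5) and (4.6). The names, the integration cutoff, and the choice of $\tau^*$ values are illustrative assumptions.

```python
import math
from scipy import integrate

def scaled_integral(p, tau, B):
    """K_p = int_0^inf y^p (y + tau)^(B-1) exp(-y^2/2 - tau*y) dy.

    Integrated up to y = 60/tau, beyond which the factor exp(-tau*y) is below exp(-60).
    """
    f = lambda y: y ** p * (y + tau) ** (B - 1) * math.exp(-0.5 * y * y - tau * y)
    return integrate.quad(f, 0.0, 60.0 / tau, epsabs=0.0, epsrel=1e-10)[0]

def laplace_check(tau, B):
    h = 2 ** (B / 2) * math.gamma(B / 2)          # h(B) = 2^(B/2) Gamma(B/2)
    s = 2.0 * math.exp(-0.5 * tau ** 2) / h        # I_1 = -h*s*K1 and I_2 = h*s*K2
    k1, k2 = scaled_integral(1, tau, B), scaled_integral(2, tau, B)
    I, J = s * k1, s * k2                          # the f-normalized integrals in (4.3)-(4.4)
    delta = ((B + tau ** 2) * I + tau * J) / (B * (tau + I))           # (4.4)
    rho = (B * delta - J) / (delta * (B + tau ** 2 - J))               # (4.3)
    delta_asym = 2 * tau ** (B - 2) * math.exp(-0.5 * tau ** 2) / (B * h)   # (4.6)
    return {
        "K1/tau^(B-3)": k1 / tau ** (B - 3),
        "K2/(2 tau^(B-4))": k2 / (2 * tau ** (B - 4)),
        "delta/(4.6)": delta / delta_asym,
        "rho*tau^2/B": rho * tau ** 2 / B,                  # (4.5)
        "2log(1/delta)/tau^2": -2 * math.log(delta) / tau ** 2,
    }

for tau in (5.0, 10.0, 20.0):
    print(tau, {k: round(v, 4) for k, v in laplace_check(tau, B=4).items()})
```

All ratios approach one as $\tau^*$ grows. The last one converges slowly because of the $(B-2)\log\tau^*$ correction, which is why the rate $B/(2\log(1/\delta))$ is only reached asymptotically.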



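4.2. Proof of Theorem 3.1. We now derive an expression for $\rho^*(\delta)$ when $\delta \to 0$. By definition,

$$M_B = \frac{1}{B}\inf_{\eta}\sup_{G}\,\mathbb{E}\Big\{\|\mathbf{x}_B - \eta(\mathbf{x}_B + \mathbf{z}_B)\|_2^2\Big\} \ge \frac{1}{B}\sup_{G}\inf_{\eta}\,\mathbb{E}\Big\{\|\mathbf{x}_B - \eta(\mathbf{x}_B + \mathbf{z}_B)\|_2^2\Big\},$$

where $\mathbf{x}_B$ is drawn from $F(\mathbf{x}_B)$ as defined in Section 2.1. Defining $\mathbf{y}_B = \mathbf{x}_B + \mathbf{z}_B$, the Bayes estimator $\mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]$ minimizes the risk for every $G$. Thus, the inner infimum equals

$$M_B(G) = \frac{1}{B}\,\mathbb{E}\Big\{\|\mathbf{x}_B - \mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]\|_2^2\Big\}.$$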



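Consider $F(\mathbf{x}_B) = (1-\epsilon)\,\delta_0(\|\mathbf{x}_B\|_2) + \epsilon\,G^*(\mathbf{x}_B)$, where $G^*$ is the uniform distribution on a sphere of radius $\mu$. We relate $\mu$ to $\epsilon$ as follows:

$$(1-\gamma)\log\Big(\frac{1-\epsilon}{\epsilon}\Big) = \frac{\mu^2}{2}, \qquad \gamma\log\Big(\frac{1-\epsilon}{\epsilon}\Big) = \alpha\mu, \tag{4.7}$$

where $\gamma$ is chosen such that $\mu, \alpha \to \infty$ as $\epsilon \to 0$. One such choice is

$$\gamma \sim \Big[\log\Big(\frac{1-\epsilon}{\epsilon}\Big)\Big]^{-1/4}.$$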
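To see how $\mu$ and $\alpha$ behave, the following sketch reads (4.7) as the two relations $(1-\gamma)L = \mu^2/2$ and $\gamma L = \alpha\mu$ with $L = \log((1-\epsilon)/\epsilon)$. It plugs in the concrete choice $\gamma = L^{-1/4}$, an assumption that instantiates the "$\sim$" above, for a few values of $\epsilon$.

```python
import math

def growth_parameters(eps):
    """gamma, mu, alpha under the hypothetical choice gamma = L^(-1/4), L = log((1 - eps)/eps)."""
    L = math.log1p(-eps) - math.log(eps)
    gamma = L ** -0.25
    mu = math.sqrt(2 * (1 - gamma) * L)   # (1 - gamma) L = mu^2 / 2
    alpha = gamma * L / mu                 # gamma L = alpha * mu
    return gamma, mu, alpha

for eps in (1e-3, 1e-6, 1e-12, 1e-50, 1e-100):
    g, m, a = growth_parameters(eps)
    print(f"eps={eps:.0e}  gamma={g:.3f}  mu={m:.2f}  alpha={a:.3f}  alpha/mu={a / m:.3f}")
```

The growth is slow. Here $\alpha = L^{1/4}/\sqrt{2(1 - L^{-1/4})}$ grows like $L^{1/4}$, while $\alpha/\mu = \gamma/(2(1-\gamma)) \to 0$. The hard-threshold level $\mu + \alpha$ in (4.11) below is therefore dominated by $\mu$.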



Before we proceed to find the risk function associated with $G^*$, we develop some intuition on the behavior of the Bayes estimator.

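Under $G^*$, write $\mathbf{x}_B = \mu\boldsymbol{\theta}_B$, where $\boldsymbol{\theta}_B$ is uniformly distributed on the unit sphere. Denote by $\hat{\mathbf{x}}_B(G^*)$ the Bayes estimator for $G^*$. The $k$-th element of the Bayes estimator, $\hat{x}_{[B,k]}$, is given by

$$\hat{x}_{[B,k]}(G^*) = \frac{\epsilon\int \mu\,\theta_{[B,k]}\,e^{-\|\mathbf{y}_B - \mu\boldsymbol{\theta}_B\|_2^2/2}\,d\sigma(\boldsymbol{\theta}_B)}{(1-\epsilon)\,e^{-\|\mathbf{y}_B\|_2^2/2} + \epsilon\int e^{-\|\mathbf{y}_B - \mu\boldsymbol{\theta}_B\|_2^2/2}\,d\sigma(\boldsymbol{\theta}_B)}, \tag{4.8}$$

where $d\sigma(\boldsymbol{\theta}_B)$ denotes the (normalized) Haar measure on the unit sphere. Using the conditions in (4.7), we can rewrite (4.8) as follows:

$$\hat{x}_{[B,k]}(G^*) = \frac{\int \mu\,\theta_{[B,k]}\,e^{\mu\langle\mathbf{y}_B, \boldsymbol{\theta}_B\rangle - \mu^2 - \alpha\mu}\,d\sigma(\boldsymbol{\theta}_B)}{1 + \int e^{\mu\langle\mathbf{y}_B, \boldsymbol{\theta}_B\rangle - \mu^2 - \alpha\mu}\,d\sigma(\boldsymbol{\theta}_B)}. \tag{4.9}$$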



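We wish to find an approximation for $\hat{x}_{[B,k]}$. To that end, we approximate the integrals in (4.9) using Laplace's method. This is achieved by determining the $\boldsymbol{\theta}^*_B$ that maximizes the exponent of the integrand under the constraint $\|\boldsymbol{\theta}_B\|_2 = 1$. The method of Lagrange multipliers yields

$$\nabla_{\boldsymbol{\theta}_B}\Big[\mu\langle\boldsymbol{\theta}_B, \mathbf{y}_B\rangle - \lambda\Big(\sum_{i=1}^{B}\theta_{[B,i]}^2 - 1\Big)\Big] = \mathbf{0},$$

so the maximizer is parallel to $\mathbf{y}_B$. The exponents in both integrals are maximized at $\boldsymbol{\theta}^*_B = \mathbf{y}_B/\|\mathbf{y}_B\|_2$. Thus, (4.9) is approximated by

$$\hat{x}_{[B,k]}(G^*) \approx \frac{\mu\,\theta^*_{[B,k]}\,e^{\mu\|\mathbf{y}_B\|_2 - \mu^2 - \alpha\mu}}{1 + e^{\mu\|\mathbf{y}_B\|_2 - \mu^2 - \alpha\mu}}.$$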

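Since the exponent equals $\mu(\|\mathbf{y}_B\|_2 - \mu - \alpha)$ and $\mu \to \infty$, we see that $\hat{\mathbf{x}}_B(G^*)$ is approximately given by

$$\hat{x}_{[B,k]}(G^*) \approx \begin{cases} 0, & \|\mathbf{y}_B\|_2 < \mu + \alpha,\\[2pt] \mu\,\dfrac{y_{[B,k]}}{\|\mathbf{y}_B\|_2}, & \|\mathbf{y}_B\|_2 > \mu + \alpha,\end{cases} \tag{4.10}$$

that is,

$$\hat{\mathbf{x}}_B(G^*) \approx \begin{cases} \mathbf{0}, & \|\mathbf{y}_B\|_2 < \mu + \alpha,\\[2pt] \mu\,\dfrac{\mathbf{y}_B}{\|\mathbf{y}_B\|_2}, & \|\mathbf{y}_B\|_2 > \mu + \alpha.\end{cases} \tag{4.11}$$

Therefore, very interestingly, the Bayes estimator for this distribution behaves similarly to a hard-thresholding function.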
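For this prior, the sphere integrals in (4.8) have closed forms in terms of modified Bessel functions. For a unit vector $\mathbf{u}$, $\int e^{\kappa\langle\mathbf{u}, \boldsymbol{\theta}\rangle}d\sigma(\boldsymbol{\theta}) = \Gamma(B/2)(\kappa/2)^{1-B/2}I_{B/2-1}(\kappa)$, the von Mises-Fisher normalizing constant, and the corresponding mean direction is $\mathbf{u}\,I_{B/2}(\kappa)/I_{B/2-1}(\kappa)$. This makes it possible to compare the exact posterior mean with the hard-thresholding picture of (4.11). The sketch below builds on that identity with hypothetical parameters ($\epsilon = 10^{-6}$, $\gamma = 0.2$, $B = 4$, and $\mu, \alpha$ from the two relations in (4.7)). It is an illustration, not part of the original argument.

```python
import numpy as np
from scipy.special import expit, gammaln, ive

def bayes_mean_sphere_prior(y, eps, mu):
    """Exact E[x_B | y_B] for x_B ~ (1 - eps) delta_0 + eps * Uniform(sphere of radius mu),
    y_B = x_B + z_B, z_B ~ N(0, I_B). Requires y != 0."""
    B = y.size
    r = np.linalg.norm(y)
    kappa = mu * r
    nu = B / 2 - 1
    # log of Gamma(B/2) (kappa/2)^(1 - B/2) I_nu(kappa); ive(nu, k) = I_nu(k) * exp(-k)
    log_Z = gammaln(B / 2) + (1 - B / 2) * np.log(kappa / 2) + np.log(ive(nu, kappa)) + kappa
    # Posterior log-odds that the block is nonzero: log(eps / (1 - eps)) - mu^2 / 2 + log_Z
    log_odds = np.log(eps) - np.log1p(-eps) - mu ** 2 / 2 + log_Z
    shrink = ive(nu + 1, kappa) / ive(nu, kappa)   # I_{nu+1} / I_nu, always below 1
    return mu * shrink * expit(log_odds) * (y / r)

eps, gamma, B = 1e-6, 0.2, 4
L = np.log1p(-eps) - np.log(eps)
mu = np.sqrt(2 * (1 - gamma) * L)
alpha = gamma * L / mu
e1 = np.eye(B)[0]
for r in (2.0, 4.0, mu + alpha, 7.0, 9.0, 12.0):
    est = bayes_mean_sphere_prior(r * e1, eps, mu)
    print(f"||y_B|| = {r:5.2f}  ->  ||E[x_B | y_B]|| = {np.linalg.norm(est):.4f}")
print("mu =", round(float(mu), 3), " mu + alpha =", round(float(mu + alpha), 3))
```

For moderate $\epsilon$, the switch from $\mathbf{0}$ to roughly $\mu\,\mathbf{y}_B/\|\mathbf{y}_B\|_2$ is gradual and sits somewhat above $\mu + \alpha$. The reason is that the polynomial prefactor dropped by the leading-order Laplace approximation still matters at this $\epsilon$. The output norm never exceeds $\mu$, in line with Lemma 4.2 below.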
In the following lemmas, we further characterize the Bayes estimator and the risk function associated with $G^*$. These results will lead us to the behavior of the phase transition for this distribution.

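Lemma 4.2. The Bayes estimator associated with $G^*$ satisfies $\|\hat{\mathbf{x}}_B(G^*)\|_2 \le \mu$.

Proof. Suppose that $\|\hat{\mathbf{x}}_B(G^*)\|_2 > \mu$ on an event of positive probability. Define the estimator $\mathbf{w}_B$ as

$$\mathbf{w}_B = \begin{cases} \hat{\mathbf{x}}_B(G^*), & \|\hat{\mathbf{x}}_B(G^*)\|_2 \le \mu,\\[2pt] \mu\,\dfrac{\hat{\mathbf{x}}_B(G^*)}{\|\hat{\mathbf{x}}_B(G^*)\|_2}, & \|\hat{\mathbf{x}}_B(G^*)\|_2 > \mu.\end{cases}$$

By definition, $\|\mathbf{w}_B\|_2 \le \|\hat{\mathbf{x}}_B(G^*)\|_2$. The risk associated with the estimator $\mathbf{w}_B$ is given by

$$\begin{aligned} M^{\mathbf{w}}_B(G^*) = \frac{1}{B}\Big[&(1-\epsilon)\,\mathbb{E}\|\mathbf{w}_B\|_2^2 \\ &+ \epsilon\,\mathbb{E}\big\{\|\mathbf{w}_B - \mu\boldsymbol{\theta}_B\|_2^2 \,\big|\, \|\hat{\mathbf{x}}_B(G^*)\|_2 > \mu\big\}\,\mathbb{P}\big\{\|\hat{\mathbf{x}}_B(G^*)\|_2 > \mu\big\} \\ &+ \epsilon\,\mathbb{E}\big\{\|\mathbf{w}_B - \mu\boldsymbol{\theta}_B\|_2^2 \,\big|\, \|\hat{\mathbf{x}}_B(G^*)\|_2 \le \mu\big\}\,\mathbb{P}\big\{\|\hat{\mathbf{x}}_B(G^*)\|_2 \le \mu\big\}\Big]. \end{aligned}$$

Since $\mathbf{w}_B$ is the Euclidean projection of $\hat{\mathbf{x}}_B(G^*)$ onto the ball of radius $\mu$, which contains every possible value of $\mathbf{x}_B$ (both $\mathbf{0}$ and $\mu\boldsymbol{\theta}_B$), it is strictly closer to $\mathbf{x}_B$ whenever $\|\hat{\mathbf{x}}_B(G^*)\|_2 > \mu$; hence $M^{\mathbf{w}}_B(G^*) < M_B(G^*)$. This, however, is a contradiction, since the Bayes estimator $\hat{\mathbf{x}}_B(G^*)$ is the optimal estimator for the distribution $G^*$. Thus, we conclude that $\|\hat{\mathbf{x}}_B(G^*)\|_2 \le \mu$. □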

Before proceeding to the risk associated with $G^*$, we provide the following useful lemma.



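Lemma 4.3. Let $\mathbf{y}_B = \mu\boldsymbol{\theta}_B + \mathbf{z}_B$. If $\mu$ and $\alpha$ satisfy (4.7) (so that, in particular, $\alpha^2 > B$ for small $\epsilon$), then

$$\mathbb{P}\{\|\mathbf{y}_B\|_2 > \mu + \alpha\} \le 2e^{-\alpha^2/2} + \Big(\frac{\alpha^2}{B}\Big)^{B/2}e^{-(\alpha^2 - B)/2}.$$

Proof. Since $\mathbf{z}_B \sim \mathcal{N}(\mathbf{0}, I_B)$, $\|\mathbf{z}_B\|_2$ and $\mathbf{z}_B/\|\mathbf{z}_B\|_2$ are independent. Furthermore, $\mathbf{z}_B/\|\mathbf{z}_B\|_2$ has a uniform (Haar) distribution on the unit sphere in $\mathbb{R}^B$. Therefore, $\mathbb{P}\{\|\mathbf{y}_B\|_2 > \mu + \alpha\}$ does not depend on $\boldsymbol{\theta}_B$, and hence we set $\boldsymbol{\theta}_B = (1, 0, \ldots, 0)$. Then

$$\begin{aligned} \mathbb{P}\{\|\mathbf{y}_B\|_2 > \mu + \alpha\} &= \mathbb{P}\big\{\|(\mu, 0, \ldots, 0) + \mathbf{z}_B\|_2^2 > (\mu + \alpha)^2\big\} \\ &= \mathbb{P}\big\{\|\mathbf{z}_B\|_2^2 + 2\mu z_{[B,1]} > \alpha^2 + 2\alpha\mu\big\} \\ &= \mathbb{P}\big\{\|\mathbf{z}_B\|_2^2 + 2\mu z_{[B,1]} > \alpha^2 + 2\alpha\mu \,\big|\, |z_{[B,1]}| > \alpha\big\}\,\mathbb{P}\{|z_{[B,1]}| > \alpha\} \\ &\quad + \mathbb{P}\big\{\|\mathbf{z}_B\|_2^2 + 2\mu z_{[B,1]} > \alpha^2 + 2\alpha\mu \,\big|\, |z_{[B,1]}| \le \alpha\big\}\,\mathbb{P}\{|z_{[B,1]}| \le \alpha\} \\ &\le \mathbb{P}\{|z_{[B,1]}| > \alpha\} + \mathbb{P}\big\{\|\mathbf{z}_B\|_2^2 + 2\mu z_{[B,1]} > \alpha^2 + 2\alpha\mu \,\big|\, |z_{[B,1]}| \le \alpha\big\}\,\mathbb{P}\{|z_{[B,1]}| \le \alpha\}. \end{aligned} \tag{4.12}$$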

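Using standard bounds on the tail of a Gaussian random variable, we obtain

$$\mathbb{P}\{|z_{[B,1]}| > \alpha\} \le 2e^{-\alpha^2/2}. \tag{4.13}$$

Furthermore, on the event $\{|z_{[B,1]}| \le \alpha\}$ we have $2\mu z_{[B,1]} \le 2\alpha\mu$, so the inequality $\|\mathbf{z}_B\|_2^2 + 2\mu z_{[B,1]} > \alpha^2 + 2\alpha\mu$ can hold only if $\|\mathbf{z}_B\|_2^2 > \alpha^2$. Hence,

$$\mathbb{P}\big\{\|\mathbf{z}_B\|_2^2 + 2\mu z_{[B,1]} > \alpha^2 + 2\alpha\mu \,\big|\, |z_{[B,1]}| \le \alpha\big\}\,\mathbb{P}\{|z_{[B,1]}| \le \alpha\} \le \mathbb{P}\{\|\mathbf{z}_B\|_2^2 > \alpha^2\} \le e^{-(\alpha^2/2 - B/2)}\Big(\frac{\alpha^2}{B}\Big)^{B/2}. \tag{4.14}$$

The technique used to obtain the last inequality can be found in [20, Section 5]. Combining (4.12), (4.13), and (4.14), we obtain the desired result. □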
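A quick Monte Carlo sanity check of the bound $2e^{-\alpha^2/2} + (\alpha^2/B)^{B/2}e^{-(\alpha^2-B)/2}$ obtained by combining (4.12), (4.13), and (4.14). The parameter values are hypothetical. By rotation invariance, $\boldsymbol{\theta}_B = (1, 0, \ldots, 0)$ is used, exactly as in the proof.

```python
import numpy as np

def lemma43_bound(alpha, B):
    """2 exp(-alpha^2/2) + (alpha^2/B)^(B/2) exp(-(alpha^2 - B)/2), from (4.12)-(4.14); needs alpha^2 > B."""
    return 2 * np.exp(-alpha ** 2 / 2) + (alpha ** 2 / B) ** (B / 2) * np.exp(-(alpha ** 2 - B) / 2)

def tail_prob_mc(mu, alpha, B, trials=200_000, seed=0):
    """Monte Carlo estimate of P{||mu * theta_B + z_B||_2 > mu + alpha} with theta_B = e_1."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((trials, B))
    y[:, 0] += mu                      # y_B = mu * e_1 + z_B
    return np.mean(np.linalg.norm(y, axis=1) > mu + alpha)

for mu, alpha in [(0.0, 3.0), (5.0, 2.5), (5.0, 4.0), (20.0, 4.0)]:
    print(f"mu={mu:4.1f} alpha={alpha}: MC={tail_prob_mc(mu, alpha, 4):.5f}  bound={lemma43_bound(alpha, 4):.5f}")
```

The bound does not depend on $\mu$ and is loose for moderate $\alpha$. It does, however, vanish as $\alpha \to \infty$, which is what the proof of Lemma 4.4 uses.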



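In the following lemma, we characterize the risk function associated with $G^*$.

Lemma 4.4. Suppose $\mathbf{x}_B \sim F(\mathbf{x}_B)$, and denote the associated risk function by $M_B(G^*) = \frac{1}{B}\,\mathbb{E}\{\|\mathbf{x}_B - \mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]\|_2^2\}$. Let $\gamma > 0$, where $\gamma$ can be made arbitrarily small, and set $(1-\gamma)\log\big(\frac{1-\epsilon}{\epsilon}\big) = \frac{1}{2}\mu^2$. Then,

$$M_B(G^*) \sim \frac{2\epsilon(1-\gamma)}{B}\log\Big(\frac{1-\epsilon}{\epsilon}\Big) \quad \text{as } \epsilon \to 0.$$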



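Proof. By definition, the risk is equal to

$$M_B(G^*) = \underbrace{\frac{\epsilon}{B}\,\mathbb{E}\big\{\|\mu\boldsymbol{\theta}_B - \mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]\|_2^2\big\}}_{M^1_B(G^*)} + \underbrace{\frac{1-\epsilon}{B}\,\mathbb{E}\big\{\|\mathbb{E}[\mathbf{x}_B \mid \mathbf{z}_B]\|_2^2\big\}}_{M^2_B(G^*)},$$

where $\mathbf{y}_B = \mu\boldsymbol{\theta}_B + \mathbf{z}_B$. The expectation in $M^1_B(G^*)$ is with respect to $\mathbf{z}_B \sim \mathcal{N}(\mathbf{0}, I_B)$ as well as $\mathbf{x}_B \sim G^*$; the expectation in $M^2_B(G^*)$ is with respect to $\mathbf{z}_B$. We begin by computing $M^1_B(G^*)$. Splitting according to the threshold in (4.11), $M^1_B(G^*)$ can be expressed as

$$\begin{aligned} M^1_B(G^*) &= \underbrace{\frac{\epsilon}{B}\,\mathbb{E}\big\{\|\mu\boldsymbol{\theta}_B - \mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]\|_2^2 \,\big|\, \|\mathbf{y}_B\|_2 \le \mu + \alpha - a\big\}\,\mathbb{P}\{\|\mathbf{y}_B\|_2 \le \mu + \alpha - a\}}_{M^{*}_B(G^*)} \\ &\quad + \underbrace{\frac{\epsilon}{B}\,\mathbb{E}\big\{\|\mu\boldsymbol{\theta}_B - \mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]\|_2^2 \,\big|\, \|\mathbf{y}_B\|_2 > \mu + \alpha - a\big\}\,\mathbb{P}\{\|\mathbf{y}_B\|_2 > \mu + \alpha - a\}}_{M^{**}_B(G^*)}, \end{aligned} \tag{4.15}$$

where $a$ is a constant that can be made arbitrarily small. We first bound $M^{**}_B(G^*)$. Using Lemma 4.2, we note that

$$\|\mu\boldsymbol{\theta}_B - \mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]\|_2 \le \|\mu\boldsymbol{\theta}_B\|_2 + \|\mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]\|_2 \le 2\mu.$$

Therefore,

$$M^{**}_B(G^*) \le \frac{4\epsilon\mu^2}{B}\,\mathbb{P}\{\|\mathbf{y}_B\|_2 > \mu + \alpha - a\}. \tag{4.16}$$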



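Next, we consider $M^{*}_B(G^*)$. Recall that the $k$-th element of the Bayes estimator $\mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]$ associated with the distribution $G^*$ is given by (4.8), or equivalently (4.9). Then, given $\|\mathbf{y}_B\|_2 \le \mu + \alpha - a$, we have

$$|\hat{x}_{[B,k]}(G^*)| = \frac{\big|\int \mu\,\theta_{[B,k]}\,e^{\mu\langle\mathbf{y}_B, \boldsymbol{\theta}_B\rangle - \mu^2 - \alpha\mu}\,d\sigma(\boldsymbol{\theta}_B)\big|}{1 + \int e^{\mu\langle\mathbf{y}_B, \boldsymbol{\theta}_B\rangle - \mu^2 - \alpha\mu}\,d\sigma(\boldsymbol{\theta}_B)} \le \int \mu\,|\theta_{[B,k]}|\,e^{\mu\langle\mathbf{y}_B, \boldsymbol{\theta}_B\rangle - \mu^2 - \alpha\mu}\,d\sigma(\boldsymbol{\theta}_B).$$

To provide a further bound, note that the exponent is maximized by letting $\boldsymbol{\theta}_B = \mathbf{y}_B/\|\mathbf{y}_B\|_2$, where it equals $\mu(\|\mathbf{y}_B\|_2 - \mu - \alpha) \le -a\mu$. Moreover, $|\theta_{[B,k]}| \le 1$. Thus, $|\hat{x}_{[B,k]}(G^*)| \le \mu e^{-a\mu}$, and by the triangle inequality

$$\mu|\theta_{[B,k]}| - \mu e^{-a\mu} \le \big|\mu\theta_{[B,k]} - \mathbb{E}[x_{[B,k]} \mid \mathbf{y}_B]\big| \le \mu|\theta_{[B,k]}| + \mu e^{-a\mu}.$$

Since $a$ can be chosen arbitrarily, we let $a \to 0$ in such a manner that $a\mu \to \infty$. Therefore,

$$\begin{aligned} M^{*}_B(G^*) &= \frac{\epsilon}{B}\,\mathbb{E}\big\{\|\mu\boldsymbol{\theta}_B - \mathbb{E}[\mathbf{x}_B \mid \mathbf{y}_B]\|_2^2 \,\big|\, \|\mathbf{y}_B\|_2 \le \mu + \alpha - a\big\}\,\mathbb{P}\{\|\mathbf{y}_B\|_2 \le \mu + \alpha - a\} \\ &\approx \frac{\epsilon}{B}\,\mathbb{E}\{\|\mu\boldsymbol{\theta}_B\|_2^2\}\,\mathbb{P}\{\|\mathbf{y}_B\|_2 \le \mu + \alpha - a\} = \frac{\epsilon\mu^2}{B}\,\mathbb{P}\{\|\mathbf{y}_B\|_2 \le \mu + \alpha - a\}. \end{aligned} \tag{4.17}$$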



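Combining (4.15), (4.16), (4.17), and the fact that $M^1_B(G^*) = M^{*}_B(G^*) + M^{**}_B(G^*)$ yields

$$M^1_B(G^*) \lesssim \frac{\epsilon\mu^2}{B}\,\mathbb{P}\{\|\mathbf{y}_B\|_2 \le \mu + \alpha - a\} + \frac{4\epsilon\mu^2}{B}\,\mathbb{P}\{\|\mathbf{y}_B\|_2 > \mu + \alpha - a\}.$$

After further manipulation using Lemma 4.3 (with $\alpha - a$ in place of $\alpha$), we obtain

$$M^1_B(G^*) - \frac{\epsilon\mu^2}{B}\,\mathbb{P}\{\|\mathbf{y}_B\|_2 \le \mu + \alpha - a\} \lesssim \frac{4\epsilon\mu^2}{B}\Big[2e^{-(\alpha - a)^2/2} + \Big(\frac{(\alpha - a)^2}{B}\Big)^{B/2}e^{-((\alpha - a)^2 - B)/2}\Big].$$

Since $a \to 0$ and $\alpha \to \infty$, the bracket vanishes and $\mathbb{P}\{\|\mathbf{y}_B\|_2 \le \mu + \alpha - a\} \to 1$; we conclude that $M^1_B(G^*) \sim \epsilon\mu^2/B$.

We next characterize

$$\begin{aligned} M^2_B(G^*) &= \frac{1-\epsilon}{B}\,\mathbb{E}\{\|\mathbb{E}[\mathbf{x}_B \mid \mathbf{z}_B]\|_2^2\} \\ &= \frac{1-\epsilon}{B}\,\mathbb{E}\big\{\|\mathbb{E}[\mathbf{x}_B \mid \mathbf{z}_B]\|_2^2 \,\big|\, \|\mathbf{z}_B\|_2 > \mu + \alpha - a\big\}\,\mathbb{P}\{\|\mathbf{z}_B\|_2 > \mu + \alpha - a\} \\ &\quad + \frac{1-\epsilon}{B}\,\mathbb{E}\big\{\|\mathbb{E}[\mathbf{x}_B \mid \mathbf{z}_B]\|_2^2 \,\big|\, \|\mathbf{z}_B\|_2 \le \mu + \alpha - a\big\}\,\mathbb{P}\{\|\mathbf{z}_B\|_2 \le \mu + \alpha - a\} \\ &\le \frac{(1-\epsilon)\,\mu^2}{B}\,\mathbb{P}\{\|\mathbf{z}_B\|_2 > \mu + \alpha - a\} + O\big(\mu^2 e^{-2a\mu}\big). \end{aligned}$$

It is straightforward to see that this term is also negligible compared to $\epsilon\mu^2/B$, and therefore $M_B(G^*) = M^1_B(G^*) + M^2_B(G^*) \sim \epsilon\mu^2/B$. Combining this with (4.7) gives the desired result. □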

The PT for the distribution $F(\mathbf{x}_B)$ is found by letting $M_B(G^*) = \delta$, which gives

$$\rho(\delta) \sim \frac{B}{2(1-\gamma)\log\big(\frac{1-\epsilon}{\epsilon}\big)}.$$

Since $\epsilon = \rho\delta \le \delta$, we have $\log\big(\frac{1-\epsilon}{\epsilon}\big) \ge \log\big(\frac{1-\delta}{\delta}\big)$, and since $\gamma$ can be made arbitrarily small, $\rho(\delta) \lesssim B/(2\log(1/\delta))$. From (3.1), we know that $\rho^*(\delta) \le \rho(\delta) \lesssim B/(2\log(1/\delta))$. Moreover, recall from Theorem 3.2 that $\rho^L(\delta) \sim B/(2\log(1/\delta))$ and that it is independent of the distribution $G$. By the definition of optimality, $\rho^*(\delta) \ge \rho^L(\delta) \sim B/(2\log(1/\delta))$. Comparing this with the upper bound for $\rho^*(\delta)$, we conclude that $\rho^*(\delta) \sim B/(2\log(1/\delta))$.